Main use cases: a general language model based on the Transformer architecture, explicitly trained for intent recognition and text classification without examples (zero-shot; see the usage sketch below).
Input length: 1024 tokens (approx. 768 words)
Languages: predominantly English
Model size: ~407 million parameters
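To illustrate how such a zero-shot setup is typically used, here is a minimal sketch based on the Hugging Face transformers pipeline with facebook/bart-large-mnli. The email text and candidate labels are illustrative placeholders, not our actual test data.

```python
# Minimal sketch: zero-shot classification with facebook/bart-large-mnli.
# The email text and candidate labels below are illustrative only.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

email_text = "Hello, I can no longer log in and would like to reset my password."
candidate_labels = ["change password", "product defective/deficient", "cancel contract"]

result = classifier(email_text, candidate_labels)
print(result["labels"])  # candidate labels, sorted by descending score
print(result["scores"])  # one score per label, between 0 and 1
```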
The F1 scores for the individual concerns vary widely (0.26-0.75) and remain at a low to moderate level overall. Only one concern, "change password", shows a balanced pattern of recall and precision.
In all six other cases, the model generalizes too little: it returns hardly any false positives (precision of up to 100%, shown in purple), but finds only a fraction of the actual targets (low recall, shown in gold), down to just 15% for "product defective/deficient".
Overall, even the best concern reaches an F1 score of only 0.75, which roughly means that one in four hits is a false positive and one in four actual targets is missed. Three concerns do not even exceed an F1 score of 0.5. This model is therefore not suitable for recognizing customer concerns in emails.
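To make these numbers concrete, a short calculation shows how strongly low recall drags down F1 even at perfect precision. The precision/recall pairs are illustrative values consistent with the patterns described above, not exact figures from our evaluation.

```python
# Sketch: F1 is the harmonic mean of precision and recall, so a single
# weak component dominates the score.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Perfect precision but only 15% recall yields the lower end of our range:
print(round(f1(1.00, 0.15), 2))  # 0.26
# A balanced pattern like "change password" reaches the upper end:
print(round(f1(0.75, 0.75), 2))  # 0.75
```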
We varied two parameters in the test. First, we tried different threshold values (0-1) above which a similarity score counts as a hit; a value of 0.5 performed best. Second, for each configuration we tested whether and how allowing a text to be assigned to several concerns at the same time affects the outcome; in our test series this produced poorer results. We did not test different prompt versions, i.e. alternative formulations of the concerns, because the results up to that point were not promising enough. The results shown opposite represent the best combination of the parameters described.
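The following sketch shows how these two parameters map onto the standard zero-shot pipeline, assuming the same illustrative inputs as above: the threshold decides from which score a concern counts as a hit, and multi_label controls whether several concerns may match the same text.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# multi_label=True scores each label independently, so one text can match
# several concerns at once; in our test series this setting performed worse.
result = classifier(
    "Hello, I can no longer log in and would like to reset my password.",
    ["change password", "product defective/deficient", "cancel contract"],
    multi_label=True,
)

THRESHOLD = 0.5  # the best-performing cut-off in our tests (range tried: 0-1)
hits = [label for label, score in zip(result["labels"], result["scores"])
        if score >= THRESHOLD]
print(hits)  # every concern whose score clears the threshold
```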
Our tests for recognizing concerns revealed unusually long response times, averaging 5.44 seconds per email, which makes the model completely unsuitable for real-time applications.
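For completeness, here is a minimal sketch of how such per-email latency can be measured, reusing the classifier and candidate_labels from the first sketch; the list of emails is a hypothetical placeholder, not our test corpus.

```python
import time

# Hypothetical inputs; replace with real email bodies.
emails = [
    "Hello, I can no longer log in and would like to reset my password.",
    "The device I ordered arrived broken and I would like a replacement.",
]

start = time.perf_counter()
for text in emails:
    classifier(text, candidate_labels)  # classifier/labels from the first sketch
elapsed = time.perf_counter() - start
print(f"average response time: {elapsed / len(emails):.2f} s per email")
```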
We ran this model locally on our own servers, so there were no direct costs. In practice, the price depends heavily on the setup and the hardware used. In general, larger models are more expensive to run than smaller ones; with ~407 million parameters, BART-large-MNLI counts as a rather small model.
Due to the extremely long response times and the low recognition quality, we cannot recommend this model if you want to recognize concerns in emails from German customers.